 machine learning classification


Machine Learning Classification of Peaceful Countries: A Comparative Analysis and Dataset Optimization

Lian, K., Liebovitch, L. S., Wild, M., West, H., Coleman, P. T., Chen, F., Kimani, E., Sieck, K.

arXiv.org Artificial Intelligence

This paper presents a machine learning approach to classify countries as peaceful or non-peaceful using linguistic patterns extracted from global media articles. We employ vector embeddings and cosine similarity to develop a supervised classification model that effectively identifies peaceful countries. Additionally, we explore the impact of dataset size on model performance, investigating how shrinking the dataset influences classification accuracy. Our results highlight the challenges and opportunities associated with using large-scale text data for peace studies.
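The abstract names vector embeddings and cosine similarity but does not give the pipeline. As a rough illustration only, here is a minimal sketch of one way such a classifier could work: average the labelled embeddings per class, then assign a new embedding to the class whose centroid is most similar. The nearest-centroid rule, the three-dimensional toy vectors, and all function names are assumptions made for this example, not the authors' method.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(embedding, centroids):
    """Return the label whose centroid is most similar to the embedding."""
    return max(
        centroids,
        key=lambda label: cosine_similarity(embedding, centroids[label]),
    )


# Toy "embeddings" (made up); real ones would come from a text embedding model.
training = {
    "peaceful": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "non_peaceful": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
centroids = {label: centroid(vecs) for label, vecs in training.items()}
prediction = classify([0.7, 0.3, 0.1], centroids)
print(prediction)
```

Cosine similarity ignores vector length, so documents of different sizes can be compared by the direction of their embeddings alone.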


Galactic Component Mapping of Galaxy UGC 2885 by Machine Learning Classification

#artificialintelligence

Automating classification of galaxy components is important for understanding the formation and evolution of galaxies. Traditionally, only the larger galaxy structures such as the spiral arms, bulge, and disc are classified. Here we use machine learning (ML) pixel-by-pixel classification to automatically classify all galaxy components within digital imagery of massive spiral galaxy UGC 2885. Galaxy components include young stellar population, old stellar population, dust lanes, galaxy center, outer disc, and celestial background. We test three ML models: maximum likelihood classifier (MLC), random forest (RF), and support vector machine (SVM). We use high-resolution Hubble Space Telescope (HST) digital imagery along with textural features derived from HST imagery, band ratios derived from HST imagery, and distance layers. Textural features are typically used in remote sensing studies and are useful for identifying patterns within digital imagery. We run ML classification models with different combinations of HST digital imagery, textural features, band ratios, and distance layers to determine the most useful information for galaxy component classification. Textural features and distance layers are most useful for galaxy component identification, with the SVM and RF models performing the best. The MLC model performs worse overall but has comparable performance to SVM and RF in some circumstances. Overall, the models are best at classifying the most spectrally unique galaxy components including the galaxy center, outer disc, and celestial background. The most confusion occurs between the young stellar population, old stellar population, and dust lanes. We suggest further experimentation with textural features for astronomical research on small-scale galactic structures.


Practical considerations for Machine Learning Classification - AskSid

#artificialintelligence

There is something very satisfying about building a machine learning classifier on a toy dataset. We can achieve high accuracy and feel good while doing it. But this doesn't really prepare us for real-world datasets and the issues they pose. If you have ever trained a machine learning classification model, you may have come across this issue. People use different words for it: 'imbalanced dataset', 'the model is skewed', etc. Let's say we are training a model to detect spam emails.
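To make the imbalance problem concrete, here is a small sketch using the spam example. The 5% spam rate and the "always predict ham" baseline are assumptions invented for illustration; the excerpt gives no numbers.

```python
# Hypothetical inbox: 5 spam messages and 95 legitimate ("ham") ones.
labels = ["spam"] * 5 + ["ham"] * 95

# A useless model that never flags anything as spam.
predictions = ["ham"] * len(labels)

# Plain accuracy: fraction of messages labelled correctly.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the spam class: fraction of real spam that was caught.
spam_total = labels.count("spam")
spam_caught = sum(p == "spam" and y == "spam" for p, y in zip(predictions, labels))
spam_recall = spam_caught / spam_total

print(f"accuracy={accuracy:.2f}, spam recall={spam_recall:.2f}")
```

The baseline looks strong on accuracy while catching no spam at all, which is why per-class metrics such as recall matter on skewed data.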


Comparing Multi-class, Binary and Hierarchical Machine Learning Classification schemes for variable stars

Hosenie, Zafiirah, Lyon, Robert, Stappers, Benjamin, Mootoovaloo, Arrykrishna

arXiv.org Machine Learning

Upcoming synoptic surveys are set to generate an unprecedented amount of data. This requires an automatic framework that can quickly and efficiently provide classification labels for several new object classification challenges. Using data describing 11 types of variable stars from the Catalina Real-Time Transient Surveys (CRTS), we illustrate how to capture the most important information from computed features and describe detailed methods of how to robustly use Information Theory for feature selection and evaluation. We apply three Machine Learning (ML) algorithms and demonstrate how to optimize these classifiers via cross-validation techniques. For the CRTS dataset, we find that the Random Forest (RF) classifier performs best in terms of balanced-accuracy and geometric means. We demonstrate substantially improved classification results by converting the multi-class problem into a binary classification task, achieving a balanced-accuracy rate of $\sim$99 per cent for the classification of ${\delta}$-Scuti and Anomalous Cepheids (ACEP). Additionally, we describe how classification performance can be improved via converting a 'flat-multi-class' problem into a hierarchical taxonomy. We develop a new hierarchical structure and propose a new set of classification features, enabling the accurate identification of subtypes of cepheids, RR Lyrae and eclipsing binary stars in CRTS data.
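The abstract reports balanced accuracy and geometric means without defining them. The sketch below uses common definitions, which may differ in detail from the paper's: balanced accuracy as the arithmetic mean of per-class recall, and the geometric mean (G-mean) as the geometric mean of per-class recall. The labels and predictions are made-up toy data, and the function names are chosen for this example.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall for each class that appears in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recall."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def geometric_mean_score(y_true, y_pred):
    """Geometric mean of per-class recall; zero if any class is fully missed."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1 / len(recalls))


# Toy data: 8 objects of a common class, 2 of a rare class (values are invented).
y_true = ["RR Lyrae"] * 8 + ["ACEP"] * 2
y_pred = ["RR Lyrae"] * 8 + ["ACEP", "RR Lyrae"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, y_pred), geometric_mean_score(y_true, y_pred))
```

Both scores weight each class equally regardless of size, so a missed rare-class object costs far more here than it does under plain accuracy.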


Machine Learning Classification with Python for Direct Marketing

#artificialintelligence

How can we make a business more time-efficient, slash costs and drive up sales? The question is timeless but not rhetorical. Over the next few minutes of reading, I will apply a few classification algorithms to demonstrate how a data-analytic approach can contribute to that end. Together we'll create a predictive model that will help us customise the client databases we hand over to the telemarketing team so that they can concentrate resources on the more promising clients first. Along the way, we'll perform a number of actions on the dataset.


Machine Learning Classification: A Dataset-based Pictorial

#artificialintelligence

The concept of classification in machine learning is concerned with building a model that separates data into distinct classes. This model is built by inputting a set of training data for which the classes are pre-labeled, so that the algorithm can learn from it. The model is then used by inputting a different dataset for which the classes are withheld, allowing the model to predict their class membership based on what it has learned from the training set. Well-known classification schemes include decision trees and Support Vector Machines, among a whole host of others. As this type of algorithm requires explicit class labeling, classification is a form of supervised learning.
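The train-then-predict workflow described above can be sketched with a decision stump, that is, a decision tree with a single split. This is a deliberately tiny stand-in, not a full decision tree or SVM; it assumes one numeric feature, exactly two classes, and at least two distinct feature values. The data and function names are invented for illustration.

```python
def fit_stump(features, labels):
    """Learn one threshold on a numeric feature that minimizes training errors."""
    values = sorted(set(features))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    first, second = sorted(set(labels))  # assumes exactly two classes
    best = None
    for threshold in candidates:
        # Try both ways of assigning the two classes to the two sides.
        for low, high in [(first, second), (second, first)]:
            errors = sum(
                (low if x <= threshold else high) != y
                for x, y in zip(features, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, low, high)
    _, threshold, low, high = best
    return threshold, low, high


def predict(model, x):
    """Classify one value with a fitted stump."""
    threshold, low, high = model
    return low if x <= threshold else high


# Training set: classes are pre-labeled.
train_x = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
train_y = ["a", "a", "a", "b", "b", "b"]
model = fit_stump(train_x, train_y)

# Different data with the classes withheld: the model predicts membership.
held_out_x = [0.5, 3.0, 8.0]
predicted = [predict(model, x) for x in held_out_x]
print(model, predicted)
```

The stump never sees labels for the held-out values; its predictions come entirely from the threshold it learned on the labelled training set.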